X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C8CD83.BA07B930@onstor-exch02.onstor.net>; Fri, 13 Jun 2008 11:31:42 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: Proposed design for new(ish) boot procedure for Cougar
Date: Fri, 13 Jun 2008 11:31:42 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E0A6E8AF9@onstor-exch02.onstor.net>
In-Reply-To: <20080613113037.3f79c040@ripper.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Proposed design for new(ish) boot procedure for Cougar
Thread-Index: AcjNg5PTLb5XMPixRz2BNCPJKYQHmQAAB/zw
References: <BB375AF679D4A34E9CA8DFA650E2B04E0A6E8AE4@onstor-exch02.onstor.net><BB375AF679D4A34E9CA8DFA650E2B04E03E9A8FB@onstor-exch02.onstor.net> <20080613113037.3f79c040@ripper.onstor.net>
From: "Narayan Venkat" <narayan.venkat@onstor.com>
To: "Andy Sharp" <andy.sharp@onstor.com>,
	"Chris Vandever" <chris.vandever@onstor.com>
Cc: "Ian Brown" <ian.brown@onstor.com>,
	"dl-Design Review" <dl-designreview@onstor.com>,
	"Brian Stark" <brian.stark@onstor.com>,
	"Warren Gale" <warren.gale@onstor.com>

That's the description of End of Life!

Narayan Venkat
Vice President, Marketing
ONStor Inc. (www.onstor.com)
Tel: (408) 963-2404
Cell: (408) 221-4297

-----Original Message-----
From: Andy Sharp
Sent: Friday, June 13, 2008 11:31 AM
To: Chris Vandever
Cc: Ian Brown; dl-Design Review; Brian Stark; Warren Gale
Subject: Re: Proposed design for new(ish) boot procedure for Cougar

On Fri, 13 Jun 2008 11:15:34 -0700 "Chris Vandever"
<chris.vandever@onstor.com> wrote:

> You'll have to try harder than that.  Jobi has to restart his SSC
> daemons because he's actually trying to use his cheetah as a filer.
> However, if you have no clients and only care about the ssc daemons,
> well, that's another story...

What if you have no clients and you don't care about SSC daemons either?

> -----Original Message-----
> From: Maxim Kozlovsky
> Sent: Friday, June 13, 2008 11:12 AM
> To: Jobi Ariyamannil; Andy Sharp; Ian Brown
> Cc: dl-Design Review; Brian Stark; Warren Gale
> Subject: RE: Proposed design for new(ish) boot procedure for Cougar
>
> Oh well. This must be a part of the conspiracy to make Chris give up
> her Cheetah.
>
> >-----Original Message-----
> >From: Jobi Ariyamannil
> >Sent: Friday, June 13, 2008 10:57 AM
> >To: Maxim Kozlovsky; Andy Sharp; Ian Brown
> >Cc: dl-Design Review; Brian Stark; Warren Gale
> >Subject: RE: Proposed design for new(ish) boot procedure for Cougar
> >
> >This does not work on cheetah anymore.
> >We need to manually restart a bunch of SSC daemons after resetting
> >the fp.
> >
> >-----Original Message-----
> >From: Maxim Kozlovsky
> >Sent: Friday, June 13, 2008 9:28 AM
> >To: Andy Sharp; Ian Brown
> >Cc: dl-Design Review; Brian Stark; Warren Gale
> >Subject: RE: Proposed design for new(ish) boot procedure for Cougar
> >
> >
> >
> >>-----Original Message-----
> >>From: Andy Sharp
> >>Sent: Thursday, June 12, 2008 8:29 PM
> >>To: Ian Brown
> >>Cc: dl-Design Review; Brian Stark; Warren Gale
> >>Subject: Re: Proposed design for new(ish) boot procedure for Cougar
> >>
> >>On Thu, 12 Jun 2008 18:34:00 -0700 Ian Brown <ian.brown@onstor.com>
> >>wrote:
> >>
> >>> In production, for the Cheetah, we have always rebooted the entire
> >>> box.  There were some daemons that relied on boot up order, thus
> >>> I'd guess that you would need to restart the daemons in phase 1 if
> >>> you're going to just bounce an embedded core.
> >>
> >>That's good to know.  What little I know about Cheetah operation
> >>would likely fall into the "Lore" category.
> >>
> >>Phase I is still rebooting the whole box.  Depending on the results
> >>of testing, Phase II may never see the light of day. ~:^)
> >[MK]
> >
> >There is no need to restart the daemons. During cheetah development
> >the daemons which did care about fp/txrx/fc restarts learned to
> >listen for slot/cpu up/down events and do the right thing. This
> >used to work up to 3.2; after that I had to give up my cheetah and
> >can't testify to it since.
> >
> >>
> >>
> >>> Ian
> >>>
> >>> On Jun 12, 2008, at 6:24 PM, Andrew Sharp wrote:
> >>>
> >>>                        Cougar Boot Procedure Redesign
> >>>                        ______________________________
> >>>
> >>> Problem
> >>> =======
> >>>
> >>>     Booting takes far too long on Cougar, and in theory the
> >>> embedded nodes should be rebootable w/o rebooting Linux on the
> >>> Sibyte 1125.
> >>>
> >>> Reasons:
> >>>     1)    Image load from CF is intolerably slow.
> >>>     2)    After image load, Linux boot takes the longest but is
> >>>           the least likely to need rebooting, resulting in an
> >>>           unnecessary bottleneck.
> >>>
> >>> Solution
> >>> ========
> >>>
> >>>     Redesign the boot flow to allow the embedded cores to be
> >>>     independently booted if Linux is up.
> >>>
> >>> Proposal
> >>> ========
> >>>
> >>>     Take a phased approach to implementing a redesigned boot
> >>> procedure:
> >>>
> >>> 	Phase I
> >>> 	-------
> >>> 	1)  Change SSC PROM to load and boot only Linux.
> >>> 	2)  Change FP/TXRX PROM to write a magic cookie in a
> >>> 	    predefined memory location indicating its readiness
> >>> 	    for its image to be loaded.
> >>> 	3)  Implement an early-start Linux daemon that waits for
> >>> 	    these boot magic cookies to be set by the embedded
> >>> 	    cores, loads their images to the correct memory
> >>> 	    locations, and signals to the FP/TXRX when finished.
> >>> 	    The FP and TXRX could boot while Linux completes its
> >>> 	    boot steps.
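
For illustration, the cookie handshake in steps 2 and 3 could look
roughly like the sketch below. It only simulates the shared memory with
a bytearray. The cookie values, offsets, and function names are all
hypothetical; the proposal does not specify any of them.

```python
import struct

# Hypothetical values and layout; the proposal does not define them.
COOKIE_READY = 0xC0091E01    # written by the core's PROM: "ready for my image"
COOKIE_LOADED = 0xC0091E02   # written by the Linux daemon: "image in place"
COOKIE_OFFSET = 0            # the "predefined memory location"
IMAGE_OFFSET = 16            # where the daemon copies the image


def prom_signal_ready(mem):
    """Step 2: the FP/TXRX PROM announces it is waiting for an image."""
    struct.pack_into("<I", mem, COOKIE_OFFSET, COOKIE_READY)


def daemon_poll_and_load(mem, image):
    """Step 3: one polling pass of the early-start Linux daemon.

    Returns True if the core was ready and the image was loaded.
    """
    (cookie,) = struct.unpack_from("<I", mem, COOKIE_OFFSET)
    if cookie != COOKIE_READY:
        return False  # core has not asked for its image yet
    if IMAGE_OFFSET + len(image) > len(mem):
        raise ValueError("image does not fit in the shared region")
    mem[IMAGE_OFFSET:IMAGE_OFFSET + len(image)] = image
    # Signal the core only after the image is fully in place.
    struct.pack_into("<I", mem, COOKIE_OFFSET, COOKIE_LOADED)
    return True


def prom_can_boot(mem):
    """The PROM checks whether the daemon has finished loading."""
    (cookie,) = struct.unpack_from("<I", mem, COOKIE_OFFSET)
    return cookie == COOKIE_LOADED
```

On real hardware the ordering of the image copy and the final cookie
write would need a memory barrier or uncached access; the simulation
above ignores that.
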
> >>>
> >>> 	Phase II
> >>> 	--------
> >>> 	1)  Through testing, determine what needs to be done to
> >>> 	    allow FP/TXRX to be rebooted independently without
> >>> 	    disturbing the Linux kernel and each other.  Current
> >>> 	    daemons that communicate with FP/TXRX are not expected
> >>> 	    to be much trouble since they had to handle this for
> >>> 	    Cheetah, although this has not been extensively tested
> >>> 	    on Cheetah in the last few releases.
> >>>
> >>> Expected Results
> >>> ================
> >>>
> >>> Phase I
> >>> -------
> >>>
> >>> Current boot time      Predicted boot time      Predicted savings
> >>> -----------------      -------------------      -----------------
> >>> 2 minutes, 57 secs     1 minute, 43.7 secs      1 minute, 13.3 secs
> >>>
> >>> 41% reduction in boot time: current boot time* is 2:57, resulting
> >>> boot time is estimated to be 1:43.7, or, a savings of 1:13.3, or,
> >>> the new method would boot 1.7 times faster (2 times faster, or
> >>> twice as fast, would be a 50% reduction in boot time).
> >>>
> >>> These estimates are based on the difference in FP/TXRX image load
> >>> time: 86 seconds from the PROM versus 12.7 seconds from Linux
> >>> (cold cache).
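
The Phase I arithmetic can be reproduced from the stated inputs alone
(2:57 current boot, 86 s PROM load, 12.7 s Linux load). A small sketch,
assuming the only change is the FP/TXRX image load time; the variable
and function names are mine:

```python
current_boot_s = 2 * 60 + 57   # 2:57 measured boot time, in seconds
prom_load_s = 86.0             # FP/TXRX image load done by the PROM
linux_load_s = 12.7            # same load done from Linux, cold cache

# Savings is just the difference between the two load times.
savings_s = prom_load_s - linux_load_s
predicted_s = current_boot_s - savings_s
reduction_pct = 100 * savings_s / current_boot_s
speedup = current_boot_s / predicted_s


def mmss(seconds):
    """Format seconds as m:ss.s, e.g. 103.7 -> '1:43.7'."""
    minutes, secs = divmod(round(seconds, 1), 60)
    return f"{int(minutes)}:{secs:04.1f}"


print("predicted boot:", mmss(predicted_s))
print("savings:       ", mmss(savings_s))
print(f"reduction:      {reduction_pct:.1f}%")
print(f"speedup:        {speedup:.2f}x")
```
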
> >>>
> >>>
> >>> Phase II
> >>> --------
> >>> If just rebooting one or both of the FP/TXRX nodes, boot time is
> >>> estimated to be in the sub-10-second range.  This would
> >>> substantially increase customer satisfaction and supportability,
> >>> as well as resulting in a substantial increase in developer
> >>> efficiency.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> * Boot time measured from when PROM code starts loading the first
> >>> boot image to when nfxsh CLI is available.
> >>>
